Smart Paper Notes

WavEnhancer: Unifying Wavelet and Transformer for Image Enhancement

Zinuo Li , Xuhang Chen , Chi-Man Pun , Shuqiang Wang

Category: Computer Vision

2022-12-16

Image enhancement is a technique frequently used in digital image processing. In recent years, learning-based techniques for improving the aesthetic quality of photographs have grown in popularity. However, most current works do not optimize an image across different frequency domains and typically focus on either pixel-level or global-level enhancement. In this paper, we propose a transformer-based model in the wavelet domain to refine the different frequency bands of an image. Our method attends to both local details and high-level features, which yields superior results. In comprehensive benchmark evaluations, our method outperforms state-of-the-art methods.
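
To make "different frequency bands of an image" concrete, here is a minimal sketch of a one-level 2D Haar wavelet decomposition in plain Python. It is not the paper's model: the abstract does not say which wavelet is used, and the transformer that refines each band is omitted entirely. The function names (`haar_1d`, `inverse_haar_1d`, `transform_columns`, `haar_2d`) and the averaging normalization are assumptions made for illustration.

```python
def haar_1d(signal):
    """One level of an averaging Haar transform on an even-length list."""
    pairs = range(0, len(signal), 2)
    low = [(signal[i] + signal[i + 1]) / 2 for i in pairs]    # local averages
    high = [(signal[i] - signal[i + 1]) / 2 for i in pairs]   # local differences
    return low, high


def inverse_haar_1d(low, high):
    """Exactly undo haar_1d."""
    signal = []
    for average, difference in zip(low, high):
        signal.extend([average + difference, average - difference])
    return signal


def transform_columns(block):
    """Apply haar_1d down every column of a 2D list."""
    lows, highs = [], []
    for column in zip(*block):
        low, high = haar_1d(list(column))
        lows.append(low)
        highs.append(high)
    # Transpose back so the results are lists of rows again.
    return [list(row) for row in zip(*lows)], [list(row) for row in zip(*highs)]


def haar_2d(image):
    """Split an image (even height and width) into four sub-bands.

    ll: averages in both directions (coarse content)
    lh: averaged along rows, differenced along columns
    hl: differenced along rows, averaged along columns
    hh: differences in both directions (fine detail)
    """
    row_low, row_high = [], []
    for row in image:
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)
    ll, lh = transform_columns(row_low)
    hl, hh = transform_columns(row_high)
    return ll, lh, hl, hh


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
ll, lh, hl, hh = haar_2d(image)
print(ll)  # each entry is the mean of one 2x2 block of the input
```

A frequency-aware enhancer would process `ll` (global tone and color) and the three detail bands separately before inverting the transform. That separation is the property the abstract relies on.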

translated by Google Translate

ShaDocNet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal

Xuhang Chen , Xiaodong Cun , Chi-Man Pun , Shuqiang Wang

Category: Computer Vision

2022-11-30

Shadow removal improves the visual quality and legibility of digital copies of documents. However, document shadow removal remains an unresolved subject. Traditional techniques rely on heuristics that vary from situation to situation. Given the quality and quantity of current public datasets, the majority of neural network models are ill-equipped for this task. In this paper, we propose a Transformer-based model for document shadow removal that utilizes shadow context encoding and decoding in both shadow and shadow-free regions. Additionally, shadow detection and pixel-level enhancement are included in the whole coarse-to-fine process. On the basis of comprehensive benchmark evaluations, it is competitive with state-of-the-art methods.


Asymmetric Scalable Cross-modal Hashing

Wenyun Li , Chi-Man Pun

Category: Computer Vision

2022-07-26

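Cross-modal hashing is a successful approach to large-scale multimedia retrieval, and many hashing methods based on matrix factorization have been proposed. However, existing methods still struggle with several problems, such as how to generate binary codes efficiently rather than directly relaxing them to continuous values. In addition, most existing methods optimize over an $n \times n$ similarity matrix, which makes memory and computation costs unaffordable. In this paper, we propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) method to address these issues. It first introduces collective matrix factorization to learn a common latent space from the kernelized features of different modalities, and then transforms the similarity-matrix optimization into a distance-distance difference minimization problem with the help of semantic labels and the common latent space. The computational complexity of the $n \times n$ asymmetric optimization is thereby relieved. In generating the hash codes, we also impose an orthogonal constraint on the label information, which is indispensable for search accuracy, so computational redundancy is greatly reduced. For efficient optimization and scalability to large-scale datasets, we adopt a two-step approach rather than optimizing simultaneously. Extensive experiments on three benchmark datasets, Wiki, MIRFlickr-25K, and NUS-WIDE, demonstrate that ASCMH outperforms state-of-the-art cross-modal hashing methods in both accuracy and efficiency.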


Arbitrary Style Transfer with Structure Enhancement by Combining the Global and Local Loss

Lizhen Long , Chi-Man Pun

Category: Computer Vision

2022-07-23

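Arbitrary style transfer generates an artistic image that combines the structure of a content image with the style of an artwork, using only one trained network. The image representation used in this approach comprises a content-structure representation and a style-pattern representation, both usually taken from the high-level features of a pre-trained classification network. However, conventional classification networks are designed for classification, so they usually focus on high-level features and ignore others. As a result, the stylized images distribute style elements evenly across the whole image and make the overall image structure unrecognizable. To solve this problem, we introduce a novel arbitrary style transfer method with structure enhancement that combines global and local losses. Local structural details are represented by LapStyle, and the global structure is controlled by the image depth. Experimental results show that, compared with other state-of-the-art methods, our method generates higher-quality images with impressive visual effects on several common datasets.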


Image Harmonization with Region-wise Contrastive Learning

Jingtang Liang , Chi-Man Pun

Category: Computer Vision

2022-05-27

The image harmonization task aims to harmonize different composite foreground regions according to a specific background image. Previous methods tend to focus on improving the reconstruction ability of the generator through internal enhancements such as attention, adaptive normalization, and light adjustment. However, they pay less attention to discriminating between foreground and background appearance features within a restricted generator, which has become a new challenge in image harmonization. In this paper, we propose a novel image harmonization framework with external style fusion and a region-wise contrastive learning scheme. For the external style fusion, we use the external background appearance from the encoder as the style reference to generate the harmonized foreground in the decoder. This approach enhances the harmonization ability of the decoder through external background guidance. For the contrastive learning scheme, we design a region-wise contrastive loss function for the image harmonization task. Specifically, we first introduce a straightforward sample generation method that selects negative samples from the output harmonized foreground region and positive samples from the ground-truth background region. Our method attempts to bring together corresponding positive and negative samples by maximizing the mutual information between the foreground and background styles, which makes our harmonization network more robust at discriminating between foreground and background style features when harmonizing composite images. Extensive experiments on benchmark datasets show that our method achieves a clear improvement in harmonization quality and generalizes well to real-scenario applications.
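
The abstract does not give the form of the region-wise contrastive loss. As a hedged illustration of the general idea of a contrastive objective that lower-bounds mutual information, the sketch below implements a generic InfoNCE-style loss over plain feature vectors. The names `cosine` and `info_nce`, the temperature value, and the use of cosine similarity are all assumptions. Which image regions supply the anchor, positive, and negative vectors is left to the caller and is not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: small when the anchor is closer to the
    positive than to every negative, large otherwise."""
    positive_term = math.exp(cosine(anchor, positive) / temperature)
    negative_terms = sum(
        math.exp(cosine(anchor, negative) / temperature)
        for negative in negatives
    )
    return -math.log(positive_term / (positive_term + negative_terms))


# Hypothetical 2-D "style" vectors, purely for illustration.
anchor = [1.0, 0.0]
aligned = info_nce(anchor, positive=[1.0, 0.0], negatives=[[0.0, 1.0]])
misaligned = info_nce(anchor, positive=[0.0, 1.0], negatives=[[1.0, 0.0]])
print(aligned < misaligned)  # True
```

Minimizing a loss of this shape pulls the anchor toward its positive and pushes it away from the negatives, which is the mechanism contrastive schemes use to separate two feature populations.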


Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization

Jingtang Liang , Xiaodong Cun , Chi-Man Pun , Jue Wang

Category: Computer Vision

2021-09-13

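Image harmonization aims to modify the color of the composited region with respect to a specific background. Previous works model this task as pixel-wise image-to-image translation using UNet-family structures. However, model size and computational cost limit the ability of these models to run on edge devices and on higher-resolution images. To this end, we propose a novel spatial-separated curve rendering network (S$^2$CRNet), the first designed for efficient and high-resolution image harmonization. In S$^2$CRNet, we first extract spatial-separated embeddings from thumbnails of the masked foreground and background. Then, we design a curve rendering module (CRM), which uses linear layers to learn and combine spatial-specific knowledge and generate the parameters of the piece-wise curve mapping in the foreground region. Finally, we render the original high-resolution image directly with the learned color curves. In addition, we make two extensions of the proposed framework, Cascaded-CRM and Semantic-CRM, for cascaded refinement and semantic guidance, respectively. Experiments show that the proposed method uses over 90% fewer parameters than previous methods, yet still achieves state-of-the-art performance on both the synthesized iHarmony4 and the real-world DIH test sets. Moreover, our method runs smoothly on higher-resolution images (e.g., $2048 \times 2048$) in 0.1 seconds, with much lower GPU computational resources than all existing methods. The code will be made available at http://github.com/stefanleong/s2crnet.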
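
The final rendering step, applying a learned color curve directly to the full-resolution foreground, can be sketched as follows. This is an assumption-laden illustration rather than the released implementation: the curve is taken to be piece-wise linear with evenly spaced knots, the image is a single channel with values in [0, 1], and the knot values are hard-coded where the paper would predict them with the CRM. The names `apply_curve` and `render_foreground` are hypothetical.

```python
def apply_curve(value, knots):
    """Map an intensity in [0, 1] through a piece-wise linear curve.

    knots[k] is the curve's output at input k / (len(knots) - 1).
    """
    segments = len(knots) - 1
    value = min(max(value, 0.0), 1.0)            # clamp to the valid range
    position = value * segments
    index = min(int(position), segments - 1)     # segment containing value
    fraction = position - index                  # offset inside the segment
    return knots[index] + fraction * (knots[index + 1] - knots[index])


def render_foreground(image, mask, knots):
    """Apply the curve where mask is truthy; leave the background untouched."""
    return [
        [apply_curve(pixel, knots) if inside else pixel
         for pixel, inside in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


# Hypothetical curve that brightens mid-tones: 0 -> 0, 0.5 -> 0.6, 1 -> 1.
knots = [0.0, 0.6, 1.0]
image = [[0.25, 0.5],
         [0.5, 1.0]]
mask = [[1, 0],
        [0, 1]]
harmonized = render_foreground(image, mask, knots)
```

Because the curve has only a handful of parameters and is evaluated independently per pixel, the cost of rendering grows linearly with the number of pixels. This is consistent with the abstract's claim that the parameters can be estimated from thumbnails and then applied to the original high-resolution image.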


Learning the shape of protein micro-environments with a holographic convolutional neural network

Michael N. Pun , Andrew Ivanov , Quinn Bellamy , Zachary Montague , Colin LaMont , Philip Bradley , Jakub Otwinowski , Armita Nourmohammad

Category: Machine Learning

2022-11-05

Proteins play a central role in biology from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining protein function from structure remains a major challenge. Here, we introduce Holographic Convolutional Neural Network (H-CNN) for proteins, which is a physically motivated machine learning approach to model amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein function, including stability and binding of protein complexes. Our interpretable computational model for protein structure-function maps could guide design of novel proteins with desired function.


Single-Stage Broad Multi-Instance Multi-Label Learning (BMIML) with Diverse Inter-Correlations and its application to medical image classification

Qi Lai , Jianhang Zhou , Yanfen Gan , Chi-Man Vong , Deshuang Huang

Category: Computer Vision | Artificial Intelligence

2022-09-06

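In many real-world applications, one object (e.g., an image) can be represented or described by multiple instances (e.g., image patches) and simultaneously associated with multiple labels. Such applications can be formulated as multi-instance multi-label learning (MIML) problems and have been studied extensively over the past few years. Existing MIML methods are useful in many applications, but most suffer from relatively low accuracy and training efficiency because of several issues: i) inter-label correlations (i.e., the probabilistic correlations between the multiple labels corresponding to an object) are neglected; ii) inter-instance correlations cannot be learned directly (or jointly) with other types of correlations because instance labels are missing; iii) diverse inter-correlations (e.g., inter-label correlations, inter-instance correlations) can only be learned in multiple stages. To resolve these issues, a new single-stage framework called broad multi-instance multi-label learning (BMIML) is proposed. BMIML has three innovative modules: i) auto-weighted label enhancement learning (AWLEL) based on the broad learning system (BLS); ii) a specific MIML neural network called scalable multi-instance probabilistic regression (SMIPR); iii) interactive decision optimization (IDO). As a result, BMIML can learn the diverse inter-correlations between whole images, instances, and labels simultaneously in a single stage, giving higher classification accuracy and much faster training. Experiments show that BMIML is highly competitive with (or even better than) existing methods in accuracy and much faster than most MIML methods, even on large medical image datasets (> 90K images).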


Sparse Bayesian Learning with Diagonal Quasi-Newton Method for Large Scale Classification

Jiahua Luo , Chi-Man Vong , Jie Du

Category: Machine Learning | Machine Learning (Statistics)

2021-07-17

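Sparse Bayesian Learning (SBL) constructs an extremely sparse probabilistic model with very competitive generalization. However, SBL needs to invert a large covariance matrix, at complexity $O(M^3)$ ($M$: feature size), to update the regularization priors, which makes it difficult to use in practice. SBL has three issues: 1) inverting the covariance matrix may yield singular solutions in some cases, which prevents SBL from converging; 2) it scales poorly to problems with a high-dimensional feature space or a large data size; 3) it easily suffers from memory overflow on large-scale data. This paper addresses these issues with a newly proposed diagonal Quasi-Newton (DQN) method for SBL, called DQN-SBL, in which the inversion of the large covariance matrix is avoided so that both complexity and memory storage are reduced to $O(M)$. DQN-SBL is thoroughly evaluated on non-linear classifiers and linear feature selection using various benchmark datasets of different sizes. Experimental results verify that DQN-SBL achieves competitive generalization with a very sparse model and scales well to large-scale problems.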


Distributionally Robust Graph Learning from Smooth Signals under Moment Uncertainty

Xiaolu Wang , Yuen-Man Pun , Anthony Man-Cho So

Category: Machine Learning

2021-05-12

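We consider the problem of learning a graph from a finite set of noisy graph signal observations, where the goal is to find a smooth representation of the graph signals. This problem is motivated by the desire to infer relational structure in large datasets and has been studied extensively in recent years. Most existing approaches focus on learning a graph on which the observed signals are smooth. However, the learned graph is prone to overfitting, because it does not take unobserved signals into account. To address this issue, we propose a novel graph learning model based on the distributionally robust optimization methodology, which aims to identify a graph that not only provides a smooth representation of the observed signals but is also robust against uncertainties in them. On the statistics side, we establish out-of-sample performance guarantees for the proposed model. On the optimization side, we show that, under a mild assumption on the graph signal distribution, the proposed model admits a smooth non-convex optimization formulation. We then develop a projected gradient method to solve this formulation and establish its convergence guarantees. Our formulation provides a new perspective on regularization in the graph learning setting. Moreover, extensive numerical experiments on both synthetic and real-world data show that our model performs comparably yet more robustly across different populations of observed signals, according to various metrics.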
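
For readers unfamiliar with what it means for a signal to be smooth on a graph, the sketch below computes the Laplacian quadratic form $x^\top L x$, a standard smoothness measure in graph signal processing, and checks it against the equivalent pairwise form $\frac{1}{2}\sum_{i,j} w_{ij}(x_i - x_j)^2$. The abstract does not state the paper's exact objective, so this illustrates only the notion of smoothness, not the distributionally robust model. The function names are assumptions.

```python
def laplacian(weights):
    """Combinatorial Laplacian L = D - W of a symmetric weight matrix
    with a zero diagonal (no self-loops)."""
    size = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j]
         for j in range(size)]
        for i in range(size)
    ]


def smoothness(weights, signal):
    """Laplacian quadratic form x^T L x; smaller means smoother."""
    lap = laplacian(weights)
    size = len(signal)
    return sum(signal[i] * lap[i][j] * signal[j]
               for i in range(size) for j in range(size))


def pairwise_smoothness(weights, signal):
    """Equivalent form: 0.5 * sum_ij w_ij * (x_i - x_j)^2."""
    size = len(signal)
    return 0.5 * sum(weights[i][j] * (signal[i] - signal[j]) ** 2
                     for i in range(size) for j in range(size))


# Path graph on three nodes: 0 - 1 - 2, unit edge weights.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
constant = [1.0, 1.0, 1.0]       # identical on neighbors: perfectly smooth
alternating = [1.0, -1.0, 1.0]   # flips sign across every edge
print(smoothness(path, constant), smoothness(path, alternating))  # 0.0 8.0
```

Graph learning from smooth signals reverses this computation: given observed signals, it searches for weights that make this quantity small. Fitting only the observed samples in that search is where the overfitting risk described in the abstract arises.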
